Abstract

A scattering transform defines a signal representation which is invariant
to translations and Lipschitz continuous with respect to deformations. It
is implemented with a non-linear convolution network that iterates over
wavelet and modulus operators. Lipschitz continuity locally linearizes
deformations. Complex classes of signals and textures can be modeled with
low-dimensional affine spaces, computed with a PCA in the scattering
domain. Classification is performed with a penalized model selection.
State-of-the-art results are obtained for handwritten digit recognition
over small training sets, and for texture classification.

1 Introduction 

Affine space models are simple to compute with a Principal Component Anal- 
ysis (PCA) but are not appropriate to approximate signal classes that include 
complex forms of variability. Image classes are often invariant to rigid trans- 
formations such as translations or rotations, and include elastic deformations, 
which define highly non-linear manifolds. Textures may also be realizations 
of strongly non-Gaussian processes that cannot be discriminated with linear 
models either. 

Kernel methods define distances $d(f,g) = \|\Phi(f) - \Phi(g)\|$, with
operators $\Phi$ which address these issues by mapping $f$ and $g$ into a
space of much higher dimension. However, invariance properties and the need
to learn from small training sets rather suggest implementing a
dimensionality reduction.

If $\Phi$ has appropriate invariants and linearizes small deformations then
this paper shows that affine spaces become powerful classification models in
the transformed domain. Suppose that $f(x)$ is translated and deformed into
$D_\tau f(x) = f(x - \tau(x))$. Let $|\nabla\tau(x)|$ be the sup matrix norm
of the deformation tensor $\nabla\tau(x)$. To linearize small deformations,
we impose the following Lipschitz regularity condition:

$$\|\Phi(f) - \Phi(D_\tau f)\| \le C\,\|f\|\,\sup_x |\nabla\tau(x)| . \tag{1}$$

A small signal deformation yields a metric modification which is bounded
by the deformation amplitude. A Fourier transform modulus
$\Phi(f) = |\hat f|$ is invariant to translation but does not satisfy (1)
because high frequencies are severely modified by small deformations.
Existing invariant representations do not seem to satisfy this property
either. Localized transforms such as wavelet transforms are stable with
respect to local deformations but are not translation invariant.
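
The translation invariance of the Fourier modulus is easy to check
numerically. The following is a minimal sketch, assuming a discrete signal
and a circular translation (both are choices made for this illustration);
it only illustrates the invariance, not the instability to deformations.

```python
import numpy as np

def fourier_modulus(f):
    """Modulus of the discrete Fourier transform, a discrete stand-in for |f_hat|."""
    return np.abs(np.fft.fft(f))

rng = np.random.default_rng(0)
f = rng.standard_normal(256)
f_shifted = np.roll(f, 17)   # circular translation by 17 samples

# Largest coefficient-wise gap between the two representations.
max_gap = np.max(np.abs(fourier_modulus(f) - fourier_modulus(f_shifted)))
print(max_gap)   # of the order of floating-point round-off
```

A translation only multiplies each Fourier coefficient by a unit-modulus
phase, which the modulus removes.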

Scattering operators constructed in [11], [12] are invariant to
translations and Lipschitz continuous with respect to local deformations, up
to a log term. These scattering operators $\Phi$ create invariants by
representing signal high frequencies with interference coefficients. This
paper models complex signal classes with low-dimensional affine spaces in
the scattering domain, which are computed with a PCA. The classification is
performed by a penalized model selection.

Scattering operators may also be invariant to any compact Lie subgroup
of $GL(\mathbb{R}^2)$, such as rotations, but we concentrate on translation
invariance, which carries the main difficulties and already covers a wide
range of classification applications. Section 2 reviews the construction of
scattering operators with a cascade of wavelet transforms and modulus
operators, which defines a non-linear convolution network [8]. Section 3
shows that learning affine scattering model spaces has a linear complexity
in the number of training samples. Section 4 describes state-of-the-art
classification results obtained from a limited number of training samples in
the MNIST handwritten digit database, and for texture classification in the
CUReT database. Software is available at
"www.cmap.polytechnique.fr/scattering".

2 Scattering Transforms



In order to build a representation which is Lipschitz continuous to
deformations, a scattering transform begins from a wavelet representation.
Translation invariance is obtained by progressively mapping high frequency
wavelet coefficients to lower frequencies, with modulus operators described
in Section 2.1. Scattering operators iterate over wavelet modulus operators.
Section 2.2 shows that this defines a translation invariant representation,
which is Lipschitz continuous to deformations, up to a log term. A fast
computational algorithm is given in Section 2.3.

2.1 Wavelet Modulus Propagator 

This section explains how to represent signal high frequencies with lower 
frequency interference terms, computed with a wavelet transform modulus. 

A wavelet transform extracts information at different scales and
orientations by convolving a signal $f$ with dilated bandpass wavelets
$\psi_\gamma$ having a spatial orientation angle $\gamma \in \Gamma$:

$$W_{j,\gamma} f(x) = f \star \psi_{j,\gamma}(x) \quad\text{with}\quad \psi_{j,\gamma}(x) = 2^{-2j}\,\psi_\gamma(2^{-j}x) .$$

At the largest scale $2^J$, low frequencies are carried by a lowpass scaling
function $\phi$: $A_J f = f \star \phi_J$, with
$\phi_J(x) = 2^{-2J}\phi(2^{-J}x)$ and $\int \phi(x)\,dx = 1$. The resulting
wavelet representation is

$$W_J f = \{A_J f,\ W_{j,\gamma} f\}_{j<J,\,\gamma\in\Gamma} .$$

The norm of the wavelet operator is defined by

$$\|W_J f\|^2 = \|f \star \phi_J\|^2 + \sum_{j<J,\,\gamma\in\Gamma} \|W_{j,\gamma} f\|^2 \tag{2}$$

with $\|f\|^2 = \int |f(x)|^2\,dx$ and it satisfies

$$(1-\delta)\,\|f\|^2 \le \|W_J f\|^2 \le \|f\|^2 \tag{3}$$

if and only if for all $\omega \in \mathbb{R}^2$,

$$1-\delta \le |\hat\phi(2^J\omega)|^2 + \frac{1}{2} \sum_{j<J,\,\gamma\in\Gamma} \left( |\hat\psi_\gamma(2^j\omega)|^2 + |\hat\psi_\gamma(-2^j\omega)|^2 \right) \le 1 . \tag{4}$$

We consider families of complex wavelets

$$\psi_\gamma(x) = e^{i\xi_\gamma\cdot x}\,\theta_\gamma(x)$$

where $\theta_\gamma(x)$ are low-pass envelopes. Separable complex wavelets
may be constructed from the analytic part of wavelets defining orthonormal
bases [12], in which case $\delta = 0$ and the wavelet transform is unitary.
Oriented Gabor functions are other examples of complex wavelets, obtained
with a modulated Gaussian $\psi(x) = e^{i\xi\cdot x}\,e^{-|x|^2/(2\sigma^2)}$
which is rotated with $R_\gamma$ by an angle $\pi\gamma/|\Gamma|$:
$\psi_\gamma(x) = \psi(R_\gamma x)$. In numerical experiments, we set
$\xi = 3\pi/4$, $\sigma = 1$, $|\Gamma| = 6$, and $\phi$ is also a Gaussian
with $\sigma = 2/3$. It satisfies (4) only over a finite range of scales.
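
The oriented Gabor wavelets described above can be sampled on a pixel grid
as follows. This is a sketch under assumptions that are not in the text: the
frequency vector $\xi$ is taken along the first axis before rotation, the
rotation direction is a convention chosen here, and the function name
`gabor_bank`, the grid size and the absence of any normalization or
zero-mean correction are illustrative choices.

```python
import numpy as np

def gabor_bank(size=33, n_angles=6, xi=3 * np.pi / 4, sigma=1.0, j=0):
    """Sample psi_{j,gamma}(x) = 2^{-2j} psi(R_gamma 2^{-j} x) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x, y = x / 2**j, y / 2**j                 # dilation by 2^j
    filters = []
    for g in range(n_angles):
        theta = np.pi * g / n_angles          # rotation angle pi * gamma / |Gamma|
        # First coordinate of the rotated point (assumed rotation convention).
        u = x * np.cos(theta) + y * np.sin(theta)
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # isotropic Gaussian
        filters.append(2.0 ** (-2 * j) * np.exp(1j * xi * u) * envelope)
    return np.stack(filters)

bank = gabor_bank()
print(bank.shape)   # one complex filter per orientation
```

Since the Gaussian envelope is isotropic, the rotation only changes the
direction of the complex oscillation, so all filters of the bank share the
same modulus.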

If $f_\tau(x) = f(x - \tau)$ then

$$W_{j,\gamma} f_\tau(x) = W_{j,\gamma} f(x - \tau) \approx W_{j,\gamma} f(x)$$

if and only if $|\tau| \ll 2^j$, because $W_{j,\gamma} f$ has derivatives of
amplitude proportional to $2^{-j}$. High frequencies corresponding to fine
scales are thus highly sensitive to translations.

Translation invariance is improved by mapping high frequencies to lower
frequencies with a complex modulus operator. Since
$\psi_\gamma(x) = e^{i\xi_\gamma\cdot x}\,\theta_\gamma(x)$, we verify that

$$W_{j,\gamma} f(x) = e^{i\xi_{j,\gamma}\cdot x} \left( f_{j,\gamma} \star \theta_{j,\gamma}(x) \right) ,$$

where $\xi_{j,\gamma} = 2^{-j}\xi_\gamma$,
$\theta_{j,\gamma}(x) = 2^{-2j}\theta_\gamma(2^{-j}x)$ and
$f_{j,\gamma}(x) = e^{-i\xi_{j,\gamma}\cdot x} f(x)$. Wavelet coefficients
$W_{j,\gamma} f(x)$ are located at high frequencies because of the
$e^{i\xi_{j,\gamma}\cdot x}$ term. These oscillations are removed by a
modulus operator

$$|W_{j,\gamma} f| = |f_{j,\gamma} \star \theta_{j,\gamma}(x)| . \tag{5}$$

The energy of $|W_{j,\gamma} f|$ is now mostly concentrated in the low
frequency domain covered by the envelope
$\hat\theta_{j,\gamma}(\omega) = \hat\theta_\gamma(2^j\omega)$. It may
however also include some high frequencies produced by the modulus
singularities where $f_{j,\gamma} \star \theta_{j,\gamma}(x) = 0$. Using
complex wavelets is important to reduce the number of such singularities and
thus concentrate the information at low frequencies.

If $f(x) = \sum_n a_n \cos(\omega_n x)$ then one can verify that
$|W_{j,\gamma} f(x)| = c_{j,\gamma} + \epsilon_{j,\gamma}(x)$ where
$\epsilon_{j,\gamma}(x)$ is an interference term. It is a combination of the
$\cos((\omega_n - \omega_{n'})x)$, for all $\omega_n$ and $\omega_{n'}$ in
the support of $\hat\psi_\gamma(2^j\omega)$. The modulus yields
interferences that depend upon frequency intervals, but it loses the exact
frequency locations $\omega_n$ in each octave.
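
As a worked instance of this interference term, consider two cosines. The
computation below is a sketch under assumptions that are not stated in the
text: the signal is one-dimensional, the dilated wavelet has a real,
non-negative Fourier transform that vanishes at negative frequencies (an
analytic wavelet), and both frequencies lie in its support. The symbols
$b_1$ and $b_2$ are introduced here for the illustration.

```latex
f(x) = \cos(\omega_1 x) + \cos(\omega_2 x), \qquad
b_n = \hat\psi_\gamma(2^j \omega_n) \ge 0 .

% Each cosine contributes only its positive-frequency exponential:
f \star \psi_{j,\gamma}(x)
  = \tfrac{1}{2} \left( b_1 e^{i\omega_1 x} + b_2 e^{i\omega_2 x} \right)

% Squared modulus: a constant plus one beat at the difference frequency:
|f \star \psi_{j,\gamma}(x)|^2
  = \tfrac{1}{4} \left( b_1^2 + b_2^2
      + 2\, b_1 b_2 \cos\big( (\omega_1 - \omega_2)\, x \big) \right)
```

The oscillating part depends on $\omega_1$ and $\omega_2$ only through their
difference, which is why the exact frequency locations within the octave are
lost.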
A complex modulus is applied to all wavelet coefficients, but not to the
low frequencies $A_J f$, which defines a wavelet modulus propagator:

$$U_J f = \{A_J f,\ |W_{j,\gamma} f|\}_{j<J,\,\gamma\in\Gamma} .$$

Since $\big||a| - |b|\big| \le |a - b|$ and the wavelet transform is
contractive, it results that

$$\|U_J f - U_J g\| \le \|W_J f - W_J g\| \le \|f - g\|$$

and $\|U_J f\| = \|f\|$ if $\delta = 0$ in (3).
2.2 Multiple Paths Scattering 

A scattering operator iterates on the propagation operator $U_J$ [11], and
defines a convolutional network. It is contractive, translation invariant
and Lipschitz continuous to deformations, up to a log factor.

A scattering operator is defined along a path
$p = \{(j_n, \gamma_n)\}_{n \le |p|}$ which is a family of wavelet indices.
It computes $|p|$ wavelet convolutions and modulus along this path:

$$S(p)f = \big|\, \cdots \big|\, |f \star \psi_{j_1,\gamma_1}| \star \psi_{j_2,\gamma_2} \big| \cdots \star \psi_{j_{|p|},\gamma_{|p|}} \big| ,$$

with $j_n < J$ and $\gamma_n \in \Gamma$. The scattering output at the scale
$2^J$ is

$$S_J(p)f = S(p)f \star \phi_J .$$

One can verify that scattering coefficients for paths of length $m + 1$ are
computed by applying the wavelet modulus propagator $U_J$ to scattering
coefficients for all paths $p$ of length $|p| = m$:

$$\{U_J\, S(p)f\}_{|p|=m} = \{S_J(p)f\}_{|p|=m} \cup \{S(p)f\}_{|p|=m+1} . \tag{6}$$

A scattering transform is thus obtained by cascading the propagator $U_J$,
which defines a convolution network illustrated in Figure 1. Computing
$U_J f$ generates a first layer of transformed signals
$S(p)f = |f \star \psi_{j,\gamma}|$ for $p = \{(j,\gamma)\}$, and transforms
$f$ into $S_J(\emptyset)f = f \star \phi_J$. Suppose that all scattered
signals $S(p)f$ have already been computed for $|p| \le m$. According to
(6), the next layer $m + 1$ is calculated by applying $U_J$ to each $S(p)f$
on the $m$-th layer where $|p| = m$. It also transforms the $m$-th layer of
$S(p)f$ into $S_J(p)f = S(p)f \star \phi_J$.
The scattering operator $S_J(p)f$ is thus computed along all possible paths
with a convolutional network. Convolution networks are general
computational architectures introduced by LeCun [8], that involve
convolutions and non-linear operators. They are usually constructed with
deep-learning backpropagation algorithms [5] to learn the filter
coefficients. They have been successfully applied to a number of recognition
tasks [8] and proposed as models for visual perception [3], [10]. In this
case, the filters are dilated wavelets which are not learned, and the
non-linearities are modulus operators.

As opposed to standard layered neural networks which output the last
layer, all nodes of a scattering network output a signal $S_J(p)f$ which is
used for classification. Depending upon $|p|$, $S_J(p)f$ gives different
types of information on $f$. For $|p| = 0$,
$S_J(\emptyset)f = f \star \phi_J$ averages the signal. For $|p| = 1$,
$S_J(p)f = |f \star \psi_{j_1,\gamma_1}| \star \phi_J$ is an averaged wavelet
modulus, which tends to the $L^1$ norm of wavelet coefficients when $J$
increases. For $|p| \ge 2$, it provides higher order interferences between
the signal components along different orientations and frequency octaves.

For appropriate complex wavelets, one can prove [12] that the energy
$\sum_{|p|=m} \|S_J(p)f\|^2$ of a scattering layer $m$ tends to zero as $m$
increases. This decay is fast. Numerically the maximum network depth is
typically limited to $m = 3$.
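
The layered cascade can be sketched in a few lines for one-dimensional
signals. This is an illustration under simplifying assumptions that are not
the paper's implementation: a single orientation, Gaussian filters defined
directly in the Fourier domain without normalization, circular convolutions,
no subsampling, scale indices $j = 0, \dots, J-1$, and paths restricted to
increasing scales. The names `filters_1d` and `scattering_1d` are
hypothetical.

```python
import numpy as np

def filters_1d(n, J, xi=3 * np.pi / 4, sigma=1.0, sigma_phi=2 / 3):
    """Fourier transforms of Gabor-like wavelets psi_j (j < J) and of the low-pass phi_J."""
    omega = 2 * np.pi * np.fft.fftfreq(n)      # frequencies in [-pi, pi)
    psi_hat = [np.exp(-0.5 * (sigma * (2**j * omega - xi)) ** 2) for j in range(J)]
    phi_hat = np.exp(-0.5 * (sigma_phi * 2**J * omega) ** 2)
    return psi_hat, phi_hat

def conv(g, h_hat):
    """Circular convolution of g with the filter whose Fourier transform is h_hat."""
    return np.fft.ifft(np.fft.fft(g) * h_hat)

def scattering_1d(f, J, m0=2):
    """Return {path: S_J(p) f} for paths of increasing scales with |p| <= m0."""
    psi_hat, phi_hat = filters_1d(len(f), J)
    layer = {(): np.asarray(f, dtype=float)}   # S(empty path) f = f
    out = {}
    for m in range(m0 + 1):
        next_layer = {}
        for p, s in layer.items():
            out[p] = conv(s, phi_hat).real     # output S_J(p) f = S(p) f * phi_J
            if m < m0:
                start = p[-1] + 1 if p else 0  # only scales coarser than the last one
                for j in range(start, J):
                    next_layer[p + (j,)] = np.abs(conv(s, psi_hat[j]))
        layer = next_layer
    return out
```

Each pass of the outer loop plays the role of one application of the
propagator: every signal of the current layer is averaged to produce an
output and is also sent through the wavelet modulus to feed the next layer.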



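[Figure 1 diagram: the propagator $U_J$ is applied to $f$ and then to each
$S(p)f$; the successive layers output
$S_J(p_1)f = |f \star \psi_{j_1,\gamma_1}| \star \phi_J$,
$S_J(p_2)f = ||f \star \psi_{j_1,\gamma_1}| \star \psi_{j_2,\gamma_2}| \star \phi_J$ and
$S_J(p_3)f = |||f \star \psi_{j_1,\gamma_1}| \star \psi_{j_2,\gamma_2}| \star \psi_{j_3,\gamma_3}| \star \phi_J$.]

Figure 1: A scattering transform implements a layered convolution network
which iterates on a wavelet modulus propagator $U_J$.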
The scattering metric for $\Phi = S_J$ is obtained with a summation over all
paths $p$:

$$\|S_J f - S_J g\|^2 = \sum_p \|S_J(p)f - S_J(p)g\|^2 ,$$

where $\|S_J(p)f\|^2 = \int |S_J(p)f(x)|^2\,dx$. Since $S_J$ is calculated in
(6) by iterating on the contractive propagator $U_J$, it results that it is
also contractive [10]:

$$\|S_J f - S_J g\|^2 \le \|f - g\|^2 .$$

Scattering operators are not only contractive but also preserve the norm.
For appropriate complex wavelets which satisfy (4) for $\delta = 0$, one can
prove [12] that $\|S_J f\| = \|f\|$.

When a signal is translated, $f_\tau(x) = f(x - \tau)$, the scattering
transform is also translated

$$S_J(p)f_\tau(x) = S_J(p)f(x - \tau)$$

because it is computed with convolutions and modulus. However, when $J$
increases, $S_J(p)f(x)$ tends to a constant because of the convolutions with
$\phi_J$. It thus becomes translation invariant and one can verify [12] that
the asymptotic scattering metric is translation invariant:

$$\lim_{J \to \infty} \|S_J f - S_J f_\tau\| = 0 .$$

For classification the key scattering property is its Lipschitz continuity
to deformations $D_\tau f(x) = f(x - \tau(x))$. Let
$|\tau|_\infty = \sup_x |\tau(x)|$ and
$|\nabla\tau|_\infty = \sup_x |\nabla\tau(x)| < 1$, where $|\nabla\tau(x)|$
is the matrix sup norm of $\nabla\tau(x)$. Along paths of length
$|p| \le m_0$, one can prove [12] that for all
$2^J \ge |\tau|_\infty / |\nabla\tau|_\infty$ the scattering metric
satisfies

$$\|S_J D_\tau f - S_J f\| \le C\, m_0\, \|f\|\, |\nabla\tau|_\infty \log\frac{|\tau|_\infty}{|\nabla\tau|_\infty} . \tag{7}$$

The scattering operator is thus Lipschitz continuous to deformations, up to
a log term. It shows that for sufficiently large scales $2^J$, the signal
translations and deformations are locally linearized by the scattering
operator.

2.3 Fast Scattering Algorithm 

Fast scattering computations are possible because the scattering energy
$\|S_J(p)f\|^2$ is highly concentrated along a small set of paths. A
scattering transform is calculated along these paths with an $O(N)$
algorithm.
Section 2.1 explains that $|f \star \psi_{j,\gamma}|$ has an energy mostly
located over lower frequencies. As a result
$|f \star \psi_{j,\gamma}| \star \psi_{j',\gamma'}$ is negligible if
$j' \le j$. The scattering energy is thus concentrated on progressive paths
$p = \{(j_n, \gamma_n)\}_{n \le |p|}$ which satisfy $j_{n+1} > j_n$. Since
$j_n < J$, progressive paths have a length $|p| < J$. Since $\|S_J(p)f\|$
tends to zero as $|p|$ increases, progressive paths are computed up to a
maximum length $|p| \le m_0$, which is typically 3 in numerical
applications. If there are $|\Gamma|$ different mother wavelets
$\psi_\gamma$ then the number of progressive paths $p$ of length
$|p| \le m_0$ is $O(J^{m_0} |\Gamma|^{m_0})$.
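
The path count can be checked by enumerating progressive paths explicitly.
This is a small sketch, assuming scale indices $j = 0, \dots, J-1$ and
orientations indexed $0, \dots, |\Gamma|-1$; the function name
`progressive_paths` is chosen for this illustration.

```python
from itertools import combinations, product

def progressive_paths(J, n_orient, m0):
    """List paths ((j_1, g_1), ..., (j_m, g_m)) with j_1 < ... < j_m < J and m <= m0."""
    paths = [()]                                   # the empty path
    for m in range(1, m0 + 1):
        for scales in combinations(range(J), m):   # strictly increasing scales
            for angles in product(range(n_orient), repeat=m):
                paths.append(tuple(zip(scales, angles)))
    return paths

paths = progressive_paths(J=3, n_orient=6, m0=2)
print(len(paths))   # 1 + 3*6 + 3*36 = 127
```

Paths of length $m$ number $\binom{J}{m} |\Gamma|^m \le J^m |\Gamma|^m$,
which is consistent with the stated bound.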

The scattering transform is implemented along progressive paths for
$|p| \le m_0$, by iterating on wavelet transforms and modulus operators,
with the layered network illustrated in Figure 1. Each $S(p)f$ is computed
with wavelet convolutions and is sampled at intervals $A\,2^{j_{|p|}}$ where
$2^{j_{|p|}}$ is the scale of the last wavelet on the path. The oversampling
factor is typically $A = 1/2$ to avoid aliasing. The averaged signal
$S_J(p)f(n) = S(p)f \star \phi_J(n)$ is sampled at intervals $A\,2^J$. The
addition of a new element $(j, \gamma)$ at the end of a path $p$ is written
$p + (j, \gamma)$.

Algorithm 1 Progressive scattering calculations
  Set $S(\emptyset)f[n] = f[n]$, $|\emptyset| = 0$
  for all $m < m_0$ do
    for all $p$ with $|p| = m$ do
      for all $j > j_{|p|}$ and $\gamma \in \Gamma$ do
        $S(p + (j,\gamma))f[n] = |S(p)f \star \psi_{j,\gamma}[A\,2^{j} n]|$
      end for
      $S_J(p)f[n] = S(p)f \star \phi_J[A\,2^{J} n]$
    end for
  end for
  for all $p$ with $|p| = m_0$ do
    $S_J(p)f[n] = S(p)f \star \phi_J[A\,2^{J} n]$
  end for
  Output: $\{S_J(p)f\}_{|p| \le m_0}$



One can verify that the scattering signals $S_J(p)f$ along all progressive
paths of length $|p| \le m_0$ include
$O(J^{m_0} |\Gamma|^{m_0} N 2^{-2J})$ coefficients. The computational cost is
driven by the number of operations to compute the subsampled wavelet
transform convolutions along $|\Gamma|$ orientations. With a fast filter
bank algorithm, it requires $O(|\Gamma| N)$ operations for a signal of size
$N$. The overall complexity of this algorithm is thus
$O(N J^{m_0} |\Gamma|^{m_0})$. Direct convolution calculations with FFTs
bring in an extra factor $\log N$.

3 Classification 

Translation invariance and Lipschitz regularity to local deformations lin- 
earize small deformations. Signal classes can thus be approximated with low- 
dimensional affine spaces in the scattering domain. Although the scattering 
representation is implemented with a potentially deep convolution network, 
learning is not deep and it is reduced to PCA computations. The classifica- 
tion is implemented with a penalized model selection. 

3.1 Affine Scattering Space Models 

A signal class $C$ can be modeled as a realization of a random process $F$.
There are multiple sources of variability, due to the reflectivity of the
material as in textures, due to deformations or to various illuminations.
Illumination variability is often low-frequency and can be approximated in
linear spaces of dimension close to 10 pQ. This property remains valid in
the scattering domain. A scattering operator also linearizes local
deformations and reduces the variance of large classes of stationary
processes. One can thus build an affine space approximation of $S_J F$. A
scattering transform $S_J F$ along progressive paths of length
$|p| \le m_0$ is a vector of size $O(N)$, which may be much smaller than $N$
if $J$ is large.

The affine space $A_k$ of dimension $k$ which minimizes the expected
projection error $E\{\|S_J F - P_{A_k}(S_J F)\|^2\}$ is

$$A_k = \mu_J + V_k \tag{8}$$

where $\mu_J(p, x) = E\{S_J(p)F(x)\}$ and $V_k$ is the space generated by the
first $k$ eigenvectors of the covariance operator of $S_J(p)F(x)$. The space
dimension $k$ is limited to a maximum value $K$.

These affine space models are estimated by computing the empirical average
and the empirical covariance of $S_J(p)f(x)$, for all training signals
$f \in C$. The empirical covariance is diagonalized to estimate the $K$
eigenvectors of largest eigenvalues. Under mild conditions [17], the sample
covariance matrix $\Sigma$ converges in norm to the true covariance when the
number of training signals is of the order of the dimensionality of the
space where $S_J F$ belongs. Dimensionality reduction is thus important to
learn affine space models from few training signals.

Algorithm 2 Learning affine models for $C$
  for each training signal $f \in C$ do
    compute $S_J f$
  end for
  Compute the empirical average $\mu$ and covariance $\Sigma$.
  Compute with a thin SVD the $K$ eigenvectors of $\Sigma$ of largest
  eigenvalues.



The computational complexity to estimate affine space models is dominated
by eigenvector calculations. To compute the first $K$ eigenvectors, a thin
SVD algorithm requires $O(T K N)$ operations, where $T$ is the number of
training signals.
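
The learning step can be sketched with NumPy as follows. It assumes that
the scattering vectors of one class are already stacked as the rows of an
array `S` of shape $(T, N)$; the function name `learn_affine_model` is
hypothetical. Note that `numpy.linalg.svd` computes a full thin SVD, so this
sketch does not reach the $O(TKN)$ cost quoted above, which would require a
truncated solver.

```python
import numpy as np

def learn_affine_model(S, K):
    """Estimate (mu, E): class mean and the top-K covariance eigenvectors (rows of E)."""
    mu = S.mean(axis=0)
    # Thin SVD of the centered data: the rows of Vt are eigenvectors of the
    # empirical covariance, ordered by decreasing eigenvalue sing**2 / T.
    _, sing, Vt = np.linalg.svd(S - mu, full_matrices=False)
    return mu, Vt[:K]
```

Working on the centered data matrix avoids forming the $N \times N$
covariance explicitly, which matters when $N$ is much larger than $T$.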

3.2 Linear Model Selection 

Let us consider a classification problem with several classes
$\{C_i\}_{1 \le i \le I}$. We introduce a classification algorithm which
selects affine space models by minimizing a penalized approximation error.

Each class $C_i$ is represented by a family of embedded affine spaces
$A_{k,i} = \mu_i + V_{k,i}$, where $V_{k,i}$ is the space generated by the
first $k$ eigenvectors $\{e_{i,l}\}_{l \le k}$ of the empirical covariance
matrix $\Sigma_i$. For a fixed dimension $k$, a space $A_{k,i}$ is
discriminative for $f \in C_i$ if the projection error of $S_J f$ in
$A_{k,i}$ is smaller than its projection error in the other spaces
$A_{k,i'}$:

$$\forall i' \neq i , \quad \|S_J f - P_{A_{k,i'}}(S_J f)\|^2 > \|S_J f - P_{A_{k,i}}(S_J f)\|^2$$

with

$$\|S_J f - P_{A_{k,i}}(S_J f)\|^2 = \|S_J f - \mu_i\|^2 - \sum_{l=1}^{k} |\langle S_J f - \mu_i ,\, e_{i,l} \rangle|^2 .$$

Model selection for classification is not about finding an accurate
approximation model as in model selection for regression but looks for a
discriminative model [2]. If $S_J f$ for $f \in C_i$ is close to the class
centroid $\mu_i$ then low-dimensional affine spaces are highly
discriminative even if the remaining error is not negligible, because it is
unlikely that any other low-dimensional affine space $A_{k,i'}$ yields a
comparable error. If $f$ is an "outlier" which is far from the centroid
$\mu_i$ then a higher dimensional approximation space $A_{k,i}$ is needed
for discrimination. It is therefore necessary to adjust the dimensionality
of the discrimination space to each signal $f$. The class index $i$ of $f$
is estimated by adjusting the dimension $k$ of the space that yields the
best approximation, with a penalization proportional to the space dimension
$k$ [2]:

$$I(f) = \arg\min_{i \le I}\ \min_{k \le K}\ \|S_J f - P_{A_{k,i}}(S_J f)\|^2 + \beta k .$$



Algorithm 3 Classification of $f$
  compute $S_J f$
  for each class $C_i$ do
    $L(i) = \min_{0 \le k \le K} \|S_J f - P_{A_{k,i}}(S_J f)\|^2 + \beta k$
  end for
  $I(f) = \arg\min_{i \le I} L(i)$



The minimum penalized energy $L(i)$ is computed from the $K$ inner products
of $S_J f$ with the eigenvectors $\{e_{i,l}\}_{l \le K}$ that generate the
embedded affine spaces $A_{k,i}$ for $k \le K$. The overall computational
complexity is thus $O(KIN)$.
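
The penalized selection can be written compactly once each class model is
available as a mean vector and a matrix whose rows are orthonormal
eigenvectors. The sketch below makes that assumption explicit; the function
names `penalized_error` and `classify` and the list-of-pairs layout of
`models` are hypothetical choices for this illustration.

```python
import numpy as np

def penalized_error(s, mu, E, beta):
    """min over 0 <= k <= K of ||s - P_{A_k}(s)||^2 + beta * k for one class (mu, E)."""
    r = s - mu
    coeffs2 = np.abs(E @ r) ** 2                   # the K squared inner products
    # Projection error for k = 0, 1, ..., K via a cumulative sum.
    errors = np.dot(r, r) - np.concatenate(([0.0], np.cumsum(coeffs2)))
    return np.min(errors + beta * np.arange(len(errors)))

def classify(s, models, beta):
    """models: list of (mu_i, E_i) pairs; returns the index i minimizing L(i)."""
    return int(np.argmin([penalized_error(s, mu, E, beta) for mu, E in models]))
```

The cumulative sum gives the errors for all dimensions $k \le K$ from a
single set of $K$ inner products, so each class costs $O(KN)$ operations.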

This classification algorithm depends upon the penalization factor $\beta$
and the scale $2^J$ of the scattering transform. These two parameters are
optimized with a cross-validation mechanism. It minimizes a classification
error computed on a validation subset of the training samples, which does
not take part in the affine model learning.

• Increasing the scale $2^J$ reduces the intra-class variability of the
representation by building invariance, but it can also reduce the distance
across classes. The optimal size $2^J$ is thus a trade-off between both.

• The penalization parameter $\beta$ is similar to a threshold on
$|\langle S_J F - \mu_i ,\, e_{i,k} \rangle|^2$. The model increases the
dimension $k$ of the approximation space if the inner product is above
$\beta$. Increasing $\beta$ thus reduces the dimension of the affine model
spaces, which is needed when the training sequence is small.
4 Classification Results and Analysis



This section presents classification results for handwritten digit
recognition, and for texture discrimination with illumination variations.
The scattering transform is implemented with the same Gabor wavelets along
$|\Gamma| = 6$ orientations for both problems, and the maximum scattering
length is limited to $m_0 = 2$.

4.1 Handwritten Digit Recognition 

The MNIST handwritten digit database provides a good example of
classification with important deformations. Table 1 compares scattering
classification results for training sets of variable size, with results
obtained with deep-learning convolutional networks [H], which currently have
the best results. Table 1 compares the PCA model selection algorithm applied
on scattering coefficients and an SVM classifier with a polynomial kernel
whose degree was optimized, also applied on scattering coefficients.
Cross-validation finds an optimal scattering scale $J = 3$, which
corresponds to translations and deformations of amplitude about $2^J = 8$
pixels, which is compatible with observed deformations on digits.

Below $5 \cdot 10^3$ training examples, a PCA scattering classifier provides
state-of-the-art results. It yields smaller errors than deep-learning
convolution networks, which require large training sets to optimize all
network parameters with backpropagation algorithms. For $60 \cdot 10^3$
training samples, the deep-learning convolution network error [6] is below
the scattering classifier error. Table 1 shows that applying a linear SVM
classifier over the scattering transform degrades the results relative to a
PCA classifier up to large training sets, and it requires much more
computation. This is an indirect validation of the linearization properties
of the scattering transform.

The last column of Table 1 gives the average dimension $k$ of the best
linear approximation spaces $A_{k,i}$ selected by the classifier. It mostly
increases with the training size because the estimation of high dimensional
model spaces requires more training samples. For small training sets, the
variance on the PCA eigenvalue estimation is large, which is taken into
account by the cross-validation which increases the penalization parameter
$\beta$ to select lower dimensional model spaces.

Figure 2 shows the relative approximation error when approximating a
signal class with an affine model in the scattering domain. For digits
$i = 1$ and $i = 4$, it gives the average intra-class approximation error of
$S_J F_i$ with a space of the same class, as a function of $k$:

$$\mathrm{In}(i) = \frac{E\{\|S_J F_i - P_{A_{k,i}} S_J F_i\|^2\}}{E\{\|S_J F_i\|^2\}} .$$

It is compared with

$$\mathrm{Out}(i) = \frac{E\{\|S_J F_{i'} - P_{A_{k,i}} S_J F_{i'}\|^2 \mid i \neq i'\}}{E\{\|S_J F_{i'}\|^2 \mid i \neq i'\}} ,$$

which is the average outer-class approximation error produced by the spaces
$A_{k,i}$ over all samples $S_J F_{i'}$ belonging to different classes
$i' \neq i$. The intra-class error decay is much faster than the outer-class
error decay for $k < 10$, which shows the discrimination ability of low
dimensional affine spaces. For $k > 10$, the intra-class versus outer-class
distance ratio In/Out is approximately $10^{-2}$ and $10^{-1}$ respectively
for the digits $i = 1$ and $i = 4$. It shows the discrimination power of
these affine models, and the much larger intra-class variability for
handwritten digits 4 than for handwritten digits 1.

Table 1: Percentage of error as a function of the training size for MNIST.
The minimum error of each row is marked with *. The last column gives the
average model space dimension $k$.

Training   ConvNets [Tl]   Scatt+SVM   Scatt+PCA    k
     300       7.18          21.5        5.93*      25
    1000       3.21          3.06        2.38*      75
    2000       2.53          1.87        1.76*     130
    5000       1.52          1.54        1.27*      85
   10000       0.85*         1.15        1.2       100
   20000       0.76*         0.92        0.9       130
   40000       0.65*         0.85        0.86      100
   60000       0.53*         0.7         0.74      100

The US Postal Service (USPS) set is another handwritten digit dataset, with
7291 training samples and 2007 test images of 16 x 16 pixels. The state of
the art is obtained with tangent distance kernels [I]. Table 2 gives results
with a PCA model selection on scattering coefficients and a polynomial
kernel SVM classifier applied to scattering coefficients. The scattering
scale was also set to $J = 3$ by cross-validation.
Figure 2: Relative intra-class In and average outer-class Out approximation
error for the digits $i = 1$ and $i = 4$.

Table 2: Error rate for the whole USPS database. 

Scatt+PCA Scatt+SVM Tangent kern. [3] humans 
2M 2M 2A 2^37 



4.2 Texture classification: CUReT

The CUReT texture database [9] includes 61 classes of image textures of
$N = 200^2$ pixels, with 46 training samples and 46 testing samples in each
class. Each texture class gives images of the same material with different
pose and illumination conditions. Specularities, shadowing and surface
normal variations make it challenging for classification. Figure 3
illustrates the large intra-class variability, and also shows that the
variability across classes is not always large.

Classification algorithms with optimized textons have an error rate of
5.35% [9] over this database, and the best result of 2.57% error rate was
obtained in [16] with an optimized Markov Random Field model.

Wavelets have been shown to provide useful models for texture analysis
[13]. Scattering classification results are shown in Table 3, with exactly
the same algorithm as for digit classification. With a PCA it greatly
improves existing results with an error rate of 0.29%. The SVM classifier
with an optimized polynomial kernel on scattering coefficients achieves a
larger error rate of 1.71%.

[Figure 2 plot legend: + In for digit 1; ° In for digit 4; * Out for digit
1; o Out for digit 4.]

Figure 3: Top row: images of the same texture material with different poses
and illuminations. Bottom row: examples of textures that are in different
classes despite their similarities.

Table 3: Error rate (%) for the CUReT database

Scatt+PCA   Scatt+SVM   Textons [9]   MRFs [16]
0.29        1.71        5.35          2.57

CUReT textures have a strong variability due to illumination. For a given
three-dimensional surface, grey level variability belongs to a linear space
of dimension of the order of 10 pQ, mostly generated by low-dimensional
functions when the surface is regular. Textures are not regular
three-dimensional surfaces; however, scattering operators seem to preserve
these low-dimensional approximation capabilities, which may partly explain
the large improvements of linear PCA model selections relative to SVM
classifications over scattering coefficients.

Learning adjusts the scattering scale to $J = 6$ by cross-validation, which
is much larger than for handwritten digit recognition. Choosing a large
scale is necessary to reduce the variance of scattering estimators. Indeed,
for a given illumination and pose, a texture can be modeled as a realization
of a stationary process $F$. If $F(x)$ is stationary, since $S_J(p)F(x)$ is
obtained through convolution and modulus operators, it remains stationary.
Moreover $S_J(p)F(x) = S(p)F \star \phi_J(x)$ so

$$\mu_J(p, x) = E\{S_J(p)F(x)\} = E\{S(p)F(x)\} = \mu(p) ,$$

which does not depend upon $J$ and $x$. The average scattering coefficients
$\mu(p)$ provide descriptors which discriminate stationary processes
including processes having the same Fourier power spectrum. For a large
class of processes, a single texture realization has a variance

$$\sigma^2(S_J F) = \sum_p E\{|S_J(p)F(x) - \mu(p)|^2\}$$

which decreases exponentially with $J$ [12]. Figure 4 shows the exponential
decay of $\sigma^2(S_J F)$ as a function of $J$, averaged over several
texture classes. All textures $F$ are normalized with a zero mean and a unit
variance. This exponential variance reduction is due to the scattering
averaging by $\phi_J$ and to the iterated removal of random phase
fluctuations by scattering modulus operators. The cross-validation choice of
a large $J$ results from the need to reach a sufficiently small variance
$\sigma^2(S_J F)$ to optimize classification results.
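
The contribution of the averaging step alone to this variance reduction can
be illustrated on synthetic data. The sketch below is a toy experiment, not
a reproduction of Figure 4: it assumes a one-dimensional Gaussian white
noise in place of a texture, takes a plain modulus without any wavelet
filtering, and replaces $\phi_J$ by a box average over blocks of $2^J$
samples.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal(2**16)     # white-noise stand-in for a texture realization
U = np.abs(F)                      # modulus only, no wavelet filtering in this toy case

variances = []
for J in range(1, 8):
    # Box average over non-overlapping blocks of 2^J samples.
    block_means = U.reshape(-1, 2**J).mean(axis=1)
    variances.append(block_means.var())

print(variances)   # roughly halves each time J increases by one
```

For independent samples the variance of a block mean is divided by the block
length, hence the geometric decay in $J$; in two dimensions the averaging
area grows as $2^{2J}$.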

[Figure 4 plot: logarithmic vertical axis with ticks from $10^{-1}$
downward; horizontal axis $J = 1, \dots, 7$.]

Figure 4: Exponential decay of $\sigma^2(S_J F)$ in log-scale as a function
of $J$.

5 Conclusion 

As a result of their translation invariance and Lipschitz regularity to
deformations, scattering operators provide appropriate representations to
model complex signal classes with affine spaces calculated with a PCA.
Classification with model selection provides state-of-the-art results with
limited training sets, for handwritten digit recognition and textures. As
opposed to discriminative classifiers such as SVM and deep-learning
convolution networks, these algorithms learn a model for each class
independently from the others, which leads to fast learning algorithms.

For signal classes including large rotations or scaling deformations, it is
necessary to use scattering operators which are invariant to these large
deformations. This is done with a combined scattering operator which
implements a wavelet transform propagator in space but also along angle
parameters to become rotation invariant [12]. Classification of images but
also of audio signals is then possible with linear affine space models on
combined scattering representations.